Abstract: We discuss a class of chain graph models for categorical variables
defined by what we call a multivariate regression chain graph Markov property.
First, the set of local independencies of these models is shown to be
Markov equivalent to those of a chain graph model recently defined in the
literature. Next we provide a parameterization based on a sequence of generalized
linear models with a multivariate logistic link function that captures
all independence constraints in any chain graph model of this kind.
1. Introduction

Discrete graphical Markov models are models for discrete distributions
representable by graphs, associating nodes with the variables and using rules that
translate properties of the graph into conditional independence statements
between variables. There are several classes of graphical models; see [25] for a review.
In this paper we focus on the class of multivariate regression chain graphs and we 
discuss their definition and parameterization for discrete variables. 

Multivariate regression chain graphs generalize directed acyclic graphs, which 
model recursive sequences of univariate responses, by allowing multiple responses. 
As in all chain graph models the variables can be arranged in a sequence of groups, 
called chain components, ordered on the basis of subject-matter considerations, 
and the variables within a group are considered to be on an equal standing as 
responses. The edges are undirected within the chain components, drawn as dashed 
lines, [7], or as bi-directed arrows, [22], and directed between components, all 
pointing in the same direction, i.e., with no chance of semi-directed cycles. One
special feature of multivariate regression chain graphs is that the responses may
depend on all the variables in all previous groups, but not on the
other responses. Chain graphs with this interpretation were first proposed by Cox
and Wermuth in [6], with several examples in [7, Chapter 5].

In the special case of a single group of responses with no explanatory variables,
multivariate regression chain graphs reduce to covariance graphs, i.e., to undirected
graphs representing marginal independencies with the basic rule that if any of their
subgraphs is disconnected, i.e., composed of completely separated sets of nodes,
then the associated variables are jointly independent; see [12] and [19]. In the
general case, the interpretation of the undirected graphs within a chain component
is that of a covariance graph, but conditional on all variables in preceding
components. For example, the missing edge (1,3) in the graph of Figure 1(b) is
interpreted as the independence statement $X_1 \perp\!\!\!\perp X_3 \mid X_4, X_5$, compactly written in
terms of nodes as $1 \perp\!\!\!\perp 3 \mid 4,5$.

The interpretation of the directed edges is that of multivariate regression models,
with a missing edge denoting a conditional independence between the response and
a variable given all the remaining potential explanatory variables. Thus, in the
chain graph of Figure 1(a) the missing arrow (1,4) indicates the independence
statement $1 \perp\!\!\!\perp 4 \mid 3$. The interpretation differs from that of classical chain graphs
(Lauritzen, Wermuth, Frydenberg, [18], [13], LWF for short), where the missing
edges mean conditional independencies given all the remaining variables including
the current responses. However, in studies involving longitudinal data, such
as the prospective study of child development discussed in [4], where there are
blocks of joint responses recorded at ages of 3 months, 2 years and 4 years, an
analysis conditioning exclusively on the previous developmental states is typically
appropriate.

Recently, [9] distinguished four types of chain graphs comprising the classical
and the alternative (Andersson, Madigan, Perlman, [1]) chain graph models,
called type I and type II, respectively. In this paper we give a formal definition of
multivariate regression chain graph models and we prove that they are equivalent
to the chain graph models of type IV in Drton's classification [9]. Moreover,
we provide a parameterization based on recursive multivariate logistic regression
models. These models, introduced in [21, Section 6.5.4] and [14], can be used to
define all the independence constraints. The models can be defined by an intuitive
rule, see Theorem 2, based on the structure of the chain graph, that can be
translated into a sequence of explicit regression models. One consequence of the
given results is that any discrete multivariate regression chain graph model is a
curved exponential family, a result obtained in [9] with a different proof.

2. Multivariate regression chain graphs 

The basic definitions and notation used in this paper closely follow [9], and they
are briefly recalled below. A chain graph $G = (V, E)$ is a graph with finite node
set $V = \{1, \dots, d\}$ and an edge set $E$ that may contain either directed edges
or undirected edges. The graph has no semi-directed cycle, i.e., no path from a
node to itself with at least one directed edge such that all directed edges have the
same orientation. The node set of a chain graph can be partitioned into disjoint
subsets $T \in \mathcal{T}$ called chain components, such that all edges in each subgraph
$G_T$ are undirected and the edges between different subsets $T_1 \neq T_2$ are directed,
pointing in the same direction. For chain graphs with the multivariate regression
interpretation the subgraphs $G_T$ within each chain component have undirected
dashed (- - -) or bi-directed (<->) edges. The former convention is adopted in
this paper. Thus, the chain graph of Figure 1(c) has three chain components,
while the previous ones have two components.

Fig 1. Three chain graphs with chain components (a): $\mathcal{T} = \{\{1, 2\}, \{3, 4\}\}$; (b): $\mathcal{T} =
\{\{1, 2, 3\}, \{4, 5\}\}$; (c): $\mathcal{T} = \{\{1, 2, 3, 4\}, \{5, 6\}, \{7\}\}$. Dashed lines only occur within chain
components.

Given a subset $A \subseteq T$ of nodes within a chain component, the subgraph $G_A$
is said to be disconnected if there exist two nodes in $A$ such that no path in $G_A$
has those nodes as endpoints. In this case, $A$ can be partitioned uniquely into a
set of $r > 1$ connected components $A_1, \dots, A_r$. Otherwise, the subgraph $G_A$ is
connected. For example, in chain graph (c) of Figure 1, the subgraph $G_A$ with
$A = \{1,2,4\}$ is disconnected with two connected components $A_1 = \{1,2\}$ and
$A_2 = \{4\}$. On the other hand, the subgraph $G_A$ with $A = \{1,2,3\}$ is connected.
In the remainder of the paper, we shall say for short that a subset $A$ of nodes in
a component is connected (resp. disconnected) if the subgraph $G_A$ is connected
(resp. disconnected).
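Checking whether a subset $A$ is connected is an ordinary graph search restricted to the subgraph $G_A$. The following Python sketch is illustrative only: the function names are hypothetical, and the dashed edges 1-2, 2-3 and 3-4 used for the component $\{1,2,3,4\}$ of Figure 1(c) are an assumption inferred from the independencies listed for that graph in Example 1, since the figure itself is not reproduced here.

```python
def connected_components(A, edges):
    """Split the node set A into the connected components of the subgraph G_A.

    `edges` is an iterable of undirected (dashed) edges given as node pairs;
    only edges with both endpoints in A are used.
    """
    A = set(A)
    adjacency = {v: set() for v in A}
    for u, w in edges:
        if u in A and w in A:
            adjacency[u].add(w)
            adjacency[w].add(u)
    components, seen = [], set()
    for start in sorted(A):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:  # depth-first search inside G_A
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(adjacency[v] - comp)
        seen |= comp
        components.append(comp)
    return components


def is_connected(A, edges):
    """A subset is connected when G_A has exactly one connected component."""
    return len(connected_components(A, edges)) == 1


# Assumed dashed edges within T = {1, 2, 3, 4} of Figure 1(c)
edges_c = [(1, 2), (2, 3), (3, 4)]
print(connected_components({1, 2, 4}, edges_c))  # [{1, 2}, {4}]
print(is_connected({1, 2, 3}, edges_c))          # True
```

Only edges with both endpoints in $A$ are kept, which is what distinguishes connectivity of $G_A$ from connectivity of the whole component.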

Any chain graph yields a directed acyclic graph $D$ of its chain components
having $\mathcal{T}$ as node set and an edge $T_1 \to T_2$ whenever there exists in the chain
graph $G$ at least one edge $v \to w$ connecting a node $v$ in $T_1$ with a node $w$ in $T_2$. In
this directed graph, we may define for each $T$ the set $\mathrm{pa}_D(T)$ as the union of all
the chain components that are parents of $T$ in the directed graph $D$. This concept
is distinct from the usual notion of the parents $\mathrm{pa}_G(A)$ of a set of nodes $A$ in
the chain graph, that is, the set of all the nodes $w$ outside $A$ such that $w \to v$
with $v \in A$. For instance, in the graph of Figure 2(a), for $T = \{1,2\}$, the set
of parent components is $\mathrm{pa}_D(T) = \{3,4,5,6\}$, whereas the set of parents of $T$ is
$\mathrm{pa}_G(T) = \{3,6\}$.

In this paper we start the analysis from a given chain graph $G = (V, E)$ with
an associated collection $\mathcal{T}$ of chain components. However, in applied work, where
variables are linked to nodes by the correspondence $X_v$ for $v \in V$, usually a
set of chain components is assumed known from previous studies of substantive
theories or from the temporal ordering of the variables. For variables within such
chain components no direction of influence is specified and they are considered as
joint responses, i.e., to be on equal standing. The relations between variables in
different chain components are directional and are typically based on a preliminary
distinction of responses, intermediate responses and purely explanatory factors.
Often, a full ordering of the components is assumed based on time order or on a
subject-matter working hypothesis; see [7].

Fig 2. (a) A chain graph and (b) one possible consistent ordering of the four chain components:
$\{1,2\} \prec \{3,4\} \prec \{5,6\} \prec \{7,8\}$. In (b) the set of predecessors of $T = \{1,2\}$ is $\mathrm{pre}(T) =
\{3, 4, 5, 6, 7, 8\}$, while the set of parent components of $T$ is $\mathrm{pa}_D(T) = \{3, 4, 5, 6\}$.

Given a chain graph $G$ with chain components $(T \mid T \in \mathcal{T})$, we can always define
a strict total order $\prec$ of the chain components that is consistent with the partial
order induced by the chain graph, i.e., such that if $T \prec T'$ then $T \notin \mathrm{pa}_D(T')$.
For instance, in the chain graph of Figure 2(a) there are four chain components
ordered in graph (b) as $\{1,2\} \prec \{3,4\} \prec \{5,6\} \prec \{7,8\}$. Note that the chosen
total order of the chain components is in general not unique and that another
consistent ordering could be $\{1,2\} \prec \{5,6\} \prec \{3,4\} \prec \{7,8\}$.

In the remainder of the paper we shall assume that a consistent ordering $\prec$
of the chain components is given. Then, for each $T$, the set of all components
preceding $T$ is known and we may define the cumulative set $\mathrm{pre}(T) = \bigcup_{T \prec T'} T'$ of
nodes contained in the predecessors of component $T$, which we sometimes also call
the past of $T$. The set $\mathrm{pre}(T)$ captures the notion of all the potential explanatory
variables of the response variables within $T$. By definition, as the full ordering of
the components is consistent with $G$, the set of predecessors $\mathrm{pre}(T)$ of each chain
component $T$ always includes the parent components $\mathrm{pa}_D(T)$.
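A minimal Python sketch of how $\mathrm{pre}(T)$ follows from a consistent ordering, written with the convention of Figure 2(b) that the explanatory components come later in the list. The parent components $\{3,4,5,6\}$ of $T = \{1,2\}$ are taken from the caption of Figure 2; the function name is hypothetical.

```python
def predecessors(ordering):
    """Return pre(T) for every chain component T.

    `ordering` lists the components as T_1 < T_2 < ... in the strict order of
    the text, so pre(T) is the union of all components listed after T.
    """
    pre = {}
    for pos, T in enumerate(ordering):
        pre[frozenset(T)] = set().union(*ordering[pos + 1:])
    return pre


order_b = [{1, 2}, {3, 4}, {5, 6}, {7, 8}]    # ordering of Figure 2(b)
order_alt = [{1, 2}, {5, 6}, {3, 4}, {7, 8}]  # the alternative ordering in the text
pa_D = {3, 4, 5, 6}                           # parent components of T = {1, 2}

for order in (order_b, order_alt):
    pre_12 = predecessors(order)[frozenset({1, 2})]
    print(sorted(pre_12), pa_D <= pre_12)  # [3, 4, 5, 6, 7, 8] True
```

Both orderings give the same past for $\{1,2\}$, while the pasts of $\{3,4\}$ and $\{5,6\}$ differ between them; in every case $\mathrm{pa}_D(T) \subseteq \mathrm{pre}(T)$.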

The following definition explains the meaning of the multivariate regression 
interpretation of a chain graph. 

Definition 1. Let $G$ be a chain graph with chain components $(T \mid T \in \mathcal{T})$ and
let $\mathrm{pre}(T)$ define an ordering of the chain components consistent with the graph.



G. M. Marchetti and M. Lupparelli/ Chain graph models of multivariate regression type 5 



A joint distribution $P$ of the random vector $X$ obeys the (global) multivariate
regression Markov property for $G$ if it satisfies the following independencies. For
all $T \in \mathcal{T}$ and for all $A \subseteq T$:

(mr1) if $A$ is connected: $A \perp\!\!\!\perp [\mathrm{pre}(T) \setminus \mathrm{pa}_G(A)] \mid \mathrm{pa}_G(A)$,
(mr2) if $A$ is disconnected with connected components $A_1, \dots, A_r$:
$A_1 \perp\!\!\!\perp \cdots \perp\!\!\!\perp A_r \mid \mathrm{pre}(T)$.

Assuming that the distribution $P$ has a density $p$ with respect to a product
measure, the definition can be stated by the following two equivalent conditions:

$p_{A \mid \mathrm{pre}(T)} = p_{A \mid \mathrm{pa}_G(A)}$, (1a)

for all $T$ and for all connected subsets $A \subseteq T$;

$p_{A \mid \mathrm{pre}(T)} = \prod_{j} p_{A_j \mid \mathrm{pre}(T)}$, (1b)

for all $T$ and for all disconnected subsets $A \subseteq T$ with connected components
$A_j$, $j = 1, \dots, r$.

In words, for any connected subset $A$ of responses in a component $T$, its conditional
distribution given the variables in the past depends only on the parents of $A$.
On the other hand, if $A$ is disconnected (i.e., the subgraph $G_A$ is disconnected), the
variables in its connected components $A_1, \dots, A_r$ are jointly independent given
the variables in the past.

Remark 1. Definition 1 gives a local Markov property which always implies the
following pairwise Markov property: for every uncoupled pair of nodes $i, k$,

$i \perp\!\!\!\perp k \mid \mathrm{pre}(T)$, if $i, k \in T$; $\quad i \perp\!\!\!\perp k \mid \mathrm{pre}(T) \setminus \{k\}$, if $i \in T$, $k \in \mathrm{pre}(T)$. (2)

In particular, two pairwise independencies $i \perp\!\!\!\perp k \mid \mathrm{pre}(T)$ and $i \perp\!\!\!\perp l \mid \mathrm{pre}(T)$ can occur
only in combination with the joint independence $i \perp\!\!\!\perp \{k, l\} \mid \mathrm{pre}(T)$. This means that
in the associated model the composition property is always satisfied; see [23].
Thus, even though we concentrate in this paper on the family of multinomial
distributions, which does not satisfy the composition property in general, the models in which
(mr1) and (mr2) hold have this property.

Remark 2. One immediate consequence of Definition 1 is that if the probability
density $p(x)$ is strictly positive, then it factorizes according to the directed acyclic
graph of the chain components:

$p(x) = \prod_{T \in \mathcal{T}} p(x_T \mid x_{\mathrm{pa}_D(T)})$. (3)

This factorization property is shared by all types of chain graphs; see [25] and [9].
Recently, [9] discussed four different block-recursive Markov properties for chain
graphs, of which we discuss here the one of type IV. To
state it we need two further concepts from graph theory. Given a chain graph $G$,
the set $\mathrm{Nb}_G(A)$ is the union of $A$ itself and the set of nodes $w$ that are neighbours
of $A$, i.e., coupled by an undirected edge to some node $v$ in $A$. Moreover,
the set of non-descendants $\mathrm{nd}_D(T)$ of a chain component $T$ is the union of all
components $T'$ such that there is no directed path from $T$ to $T'$ in the directed
graph of chain components $D$.

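Definition 2. (Chain graph Markov property of type IV); [9]. Let $G$ be a chain
graph with chain components $(T \mid T \in \mathcal{T})$ and directed acyclic graph $D$ of components.
The joint probability distribution of $X$ obeys the block-recursive Markov
property of type IV if it satisfies the following independencies:

(iv0) $T \perp\!\!\!\perp [\mathrm{nd}_D(T) \setminus \mathrm{pa}_D(T)] \mid \mathrm{pa}_D(T)$ for all $T \in \mathcal{T}$;

(iv1) $A \perp\!\!\!\perp [\mathrm{pa}_D(T) \setminus \mathrm{pa}_G(A)] \mid \mathrm{pa}_G(A)$ for all $T \in \mathcal{T}$, for all $A \subseteq T$;

(iv2) $A \perp\!\!\!\perp [T \setminus \mathrm{Nb}_G(A)] \mid \mathrm{pa}_D(T)$ for all $T \in \mathcal{T}$, for all connected subsets $A \subseteq T$.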

Then, we have the following result, proved in the Appendix. 

Theorem 1. Given a chain graph G, the multivariate regression Markov property 
is equivalent to the block-recursive Markov property of type IV. 

This result shows that Definition 1 is in fact a simplified formulation of the
block-recursive property of type IV. Moreover, Definition 1 depends
only apparently on the chosen full ordering of the chain components, because the
equivalent Definition 2 depends only on the underlying chain graph $G$.

Example 1. The independencies implied by the multivariate regression chain 
graph Markov property are illustrated below for each of the graphs of Figure 1. 

Graph (a) represents the independencies of the seemingly unrelated regression
model; see [6], [11]. For $T = \{1, 2\}$ and $\mathrm{pre}(T) = \{3, 4\}$ we have the independencies
$1 \perp\!\!\!\perp 4 \mid 3$ and $2 \perp\!\!\!\perp 3 \mid 4$. Note that for the connected set $A = \{1, 2\}$ the condition (mr1)
implies the trivial statement $A \perp\!\!\!\perp \emptyset \mid \mathrm{pre}(T)$.

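In graph (b) one has $T = \{1, 2, 3\}$ and $\mathrm{pre}(T) = \{4, 5\}$. Thus, for each connected
subset $A \subseteq T$, by (mr1), we have the non-trivial statements

$1 \perp\!\!\!\perp 5 \mid 4$; $\quad 2 \perp\!\!\!\perp 4,5$; $\quad 3 \perp\!\!\!\perp 4 \mid 5$; $\quad 1,2 \perp\!\!\!\perp 5 \mid 4$; $\quad 2,3 \perp\!\!\!\perp 4 \mid 5$.

Then, for the remaining disconnected set $A = \{1,3\}$ we obtain by (mr2) the
independence $1 \perp\!\!\!\perp 3 \mid 4,5$.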
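The statements of Definition 1 can be enumerated mechanically. The Python sketch below, with hypothetical function names, lists the non-trivial (mr1) and (mr2) statements for graph (b) of Figure 1. The dashed edges 1-2, 2-3 and the parent sets $\mathrm{pa}_G(1) = \{4\}$, $\mathrm{pa}_G(2) = \emptyset$, $\mathrm{pa}_G(3) = \{5\}$ are assumptions inferred from the statements just listed, and `_||_` is an ASCII stand-in for $\perp\!\!\!\perp$.

```python
from itertools import combinations


def components(A, edges):
    """Connected components of the subgraph G_A (undirected edges only)."""
    remaining, comps = set(A), []
    while remaining:
        comp, stack = set(), [min(remaining)]
        while stack:  # depth-first search from the smallest unvisited node
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            for u, w in edges:
                if u == v and w in remaining:
                    stack.append(w)
                elif w == v and u in remaining:
                    stack.append(u)
        comps.append(comp)
        remaining -= comp
    return comps


def fmt(S):
    return ",".join(str(v) for v in sorted(S))


def mr_statements(T, pre, edges, parents):
    """Non-trivial independencies given by (mr1) and (mr2) for component T."""
    out = []
    for k in range(1, len(T) + 1):
        for A in combinations(sorted(T), k):
            comps = components(A, edges)
            if len(comps) == 1:  # (mr1): A _||_ pre(T) minus pa_G(A), given pa_G(A)
                pa = set().union(*(parents[v] for v in A))
                rest = set(pre) - pa
                if rest:  # skip the trivial statement with an empty set
                    out.append(f"{fmt(A)} _||_ {fmt(rest)}" + (f" | {fmt(pa)}" if pa else ""))
            else:  # (mr2): A_1 _||_ ... _||_ A_r | pre(T)
                out.append(" _||_ ".join(fmt(c) for c in comps) + f" | {fmt(pre)}")
    return out


edges_b = [(1, 2), (2, 3)]
parents_b = {1: {4}, 2: set(), 3: {5}}
for s in mr_statements({1, 2, 3}, {4, 5}, edges_b, parents_b):
    print(s)
```

The full set $A = \{1,2,3\}$ produces no output because $\mathrm{pa}_G(A) = \mathrm{pre}(T)$, which matches the list above.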

In graph (c), considering the conditional distribution $p_{T \mid \mathrm{pre}(T)}$ for $T = \{1,2,3,4\}$
and $\mathrm{pre}(T) = \{5, 6, 7\}$, we can define independencies for each of the ten connected
subsets of $T$. For instance, we have

$1 \perp\!\!\!\perp 5,6,7$; $\quad 1,2 \perp\!\!\!\perp 6,7 \mid 5$; $\quad 1,2,3,4 \perp\!\!\!\perp 7 \mid 5,6$.

The last independence is equivalent to the factorization $p = p_{1234 \mid 56}\, p_{56 \mid 7}\, p_7$ of the
joint probability distribution according to the directed acyclic graph of the chain
components. The remaining five disconnected subsets of $T$ imply the conditional
independencies $1,2 \perp\!\!\!\perp 4 \mid 5,6,7$ and $1 \perp\!\!\!\perp 3,4 \mid 5,6,7$. Notice that when in a component
there are two uncoupled nodes, then there is a conditional independence given
simply the common parents of the two nodes. For example, in graph (c), we have
not only $1 \perp\!\!\!\perp 3 \mid 5,6$ but also $1 \perp\!\!\!\perp 3 \mid 5$.

Remark 3. When each component $T$ induces a complete subgraph $G_T$ and, for
all subsets $A$ in $T$, the set of parents of $A$, $\mathrm{pa}_G(A)$, coincides with the set of the
parent components of $T$, $\mathrm{pa}_D(T)$, then the only conditional independence implied
by the multivariate regression Markov property is

$A \perp\!\!\!\perp [\mathrm{pre}(T) \setminus \mathrm{pa}_D(T)] \mid \mathrm{pa}_D(T)$, for all $A \subseteq T$, $T \in \mathcal{T}$.

This condition is in turn equivalent to the factorization (3) of the joint probability
distribution.

Remark 4. In Definition 1, (mr2) is equivalent to imposing that for all $T$ the
conditional distribution $p_{T \mid \mathrm{pre}(T)}$ satisfies the independencies of a covariance graph
model with respect to the subgraph $G_T$.
In [19, Proposition 3], it is shown that a covariance graph model is defined by
constraining to zero, in the multivariate logistic parameterization, the parameters
corresponding to all disconnected subsets of the graph. In the following sections
we extend this approach to the multivariate regression chain graph models.

3. Recursive multivariate logistic regression models 
3.1. Notation 

Let $X = (X_v \mid v \in V)$ be a discrete random vector, where each variable has a
finite number of levels. Thus $X$ takes values in the set $\mathcal{I} = \prod_{v \in V}\{1, \dots, r_v\}$ whose
elements are the cells of the joint contingency table, denoted by $i = (i_1, \dots, i_d)$.
The first level of each variable is considered a reference level and we also consider
the set $\mathcal{I}^* = \prod_{v \in V}\{2, \dots, r_v\}$ of cells having all indices different from the first.
The elements of $\mathcal{I}^*$ are denoted by $i^*$.

The joint probability distribution of $X$ is defined by the mass function

$p(i) = P(X_v = i_v, v = 1, \dots, d)$ for all $i \in \mathcal{I}$,

or equivalently by the probability vector $p = (p(i), i \in \mathcal{I})$. With three variables
we shall often use $p_{ijk}$ instead of $p(i_1, i_2, i_3)$.

Given two disjoint subsets $A$ and $B$ of $V$, the marginal probability distribution
of $X_B$ is $p_B(i_B) = \sum_{j:\, j_B = i_B} p(j)$, where $i_B$ is a subvector of $i$ belonging to the marginal
contingency table $\mathcal{I}_B = \prod_{v \in B}\{1, \dots, r_v\}$. The conditional probability distributions
are defined as usual and denoted by $p(i_A \mid i_B)$, for $i_A \in \mathcal{I}_A$ and $i_B \in \mathcal{I}_B$, or,
compactly, by $p_{A \mid B}$. When appropriate, we define the set $\mathcal{I}_B^* = \prod_{v \in B}\{2, \dots, r_v\}$.

A discrete multivariate regression chain graph model $\mathcal{P}_{\mathrm{mr}}(G)$ associated with
the chain graph $G = (V, E)$ is the set of strictly positive joint probability
distributions $p(i)$, for $i \in \mathcal{I}$, that obey the multivariate regression Markov property.
By Theorem 1 this class coincides with the set $\mathcal{P}_{\mathrm{IV}}(G)$ of discrete chain graph
models of type IV.

In the next subsection we define an appropriate parameterization for each component
of the standard factorization

$p(i) = \prod_{T \in \mathcal{T}} p(i_T \mid i_{\mathrm{pre}(T)})$ (4)

of the joint probability distribution. More precisely, we define a saturated linear model
for a suitable transformation of the parameters of each conditional probability
distribution $p(i_T \mid i_{\mathrm{pre}(T)})$.



3.2. Multivariate logistic contrasts 

The suggested link function is the multivariate logistic transformation; see [21, 
p. 219], [14]. This link transforms the joint probability vector of the responses 
into a vector of logistic contrasts defined for all the marginal distributions. The 
contrasts of interest are all sets of univariate, bivariate, and higher-order contrasts. 
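In general, a multivariate logistic contrast for a marginal table is defined by
the function

$\eta^{(A)}(i_A^*) = \sum_{s \subseteq A} (-1)^{|A \setminus s|} \log p_A(i_s^*, 1_{A \setminus s})$, for $i_A^* \in \mathcal{I}_A^*$, (5)

where the notation $|A \setminus s|$ denotes the cardinality of the set $A \setminus s$, and $1_{A \setminus s}$ sets the
variables in $A \setminus s$ to their first (reference) level. The contrasts
for a margin $A$ are denoted by $\eta^{(A)}$ and the full vector of the contrasts for all
nonempty margins $A \subseteq V$ is denoted by $\eta$. The following example illustrates
the transformation for two responses.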

Example 2. Let $p_{ij}$, for $i = 1,2$, $j = 1,2,3$, be a joint bivariate distribution
for two discrete variables $X_1$ and $X_2$. Then, the multivariate logistic transform
changes the vector $p$ of probabilities, belonging to the 5-dimensional simplex, into
the $5 \times 1$ vector

$\eta = \begin{pmatrix} \eta^{(1)} \\ \eta^{(2)} \\ \eta^{(12)} \end{pmatrix}$, where
$\eta^{(1)} = \log \dfrac{p_{2+}}{p_{1+}}$, $\quad \eta^{(2)} = \begin{pmatrix} \log (p_{+2}/p_{+1}) \\ \log (p_{+3}/p_{+1}) \end{pmatrix}$, $\quad \eta^{(12)} = \begin{pmatrix} \log \dfrac{p_{11}p_{22}}{p_{21}p_{12}} \\ \log \dfrac{p_{11}p_{23}}{p_{21}p_{13}} \end{pmatrix}$,

where the $+$ suffix indicates summing over the corresponding index. Thus, the
parameters $\eta^{(1)}$ and $\eta^{(2)}$ are marginal baseline logits for the variables $X_1$ and $X_2$,
while $\eta^{(12)}$ is a vector of log odds-ratios. The definition used in this paper uses
baseline coding, i.e., the contrasts are defined with respect to a reference level, by
convention the first. Therefore the dimensions of the vectors $\eta^{(1)}$, $\eta^{(2)}$ and $\eta^{(12)}$ are
the dimensions of the sets $\mathcal{I}_1^*$, $\mathcal{I}_2^*$ and $\mathcal{I}_{12}^*$. Other coding schemes can be adopted,
as discussed for instance in [24] and [2].
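Equation (5) can be evaluated directly by summing over the subsets $s \subseteq A$. The following Python/NumPy sketch (function names hypothetical; axes are 0-based, so level 1 of the text is index 0) computes $\eta^{(A)}$ for any margin of a strictly positive table. The $2 \times 3$ table in the usage lines is made-up illustrative input, not data from the paper.

```python
import itertools

import numpy as np


def marginal(p, A):
    """Marginal table p_A: sum p over the axes not in A (kept axes stay in sorted order)."""
    other = tuple(ax for ax in range(p.ndim) if ax not in A)
    return p.sum(axis=other) if other else p


def mlogit_contrast(p, A):
    """Baseline multivariate logistic contrast eta^(A) of equation (5).

    The result is indexed by the cells of I*_A: position j along an axis
    stands for level j + 2 of the corresponding variable.
    """
    A = tuple(sorted(A))
    log_pA = np.log(marginal(p, A))
    eta = np.zeros(tuple(n - 1 for n in log_pA.shape))
    for cell in itertools.product(*(range(1, n) for n in log_pA.shape)):
        total = 0.0
        for k in range(len(A) + 1):
            for s in itertools.combinations(range(len(A)), k):
                # cell (i_s, 1_{A \ s}): keep i on s, reference level elsewhere
                ref = tuple(cell[j] if j in s else 0 for j in range(len(A)))
                total += (-1) ** (len(A) - k) * log_pA[ref]
        eta[tuple(c - 1 for c in cell)] = total
    return eta


# Illustrative 2 x 3 table p_ij (rows: X1, columns: X2)
p = np.array([[0.10, 0.15, 0.05],
              [0.20, 0.30, 0.20]])
print(mlogit_contrast(p, (0,)))    # log(p_2+ / p_1+)
print(mlogit_contrast(p, (1,)))    # log(p_+2 / p_+1), log(p_+3 / p_+1)
print(mlogit_contrast(p, (0, 1)))  # the two log odds-ratios
```

The alternating sign $(-1)^{|A \setminus s|}$ is what turns the sum into a logit for one variable and into a log odds-ratio for two.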

Remark 5. This transformation for multivariate binary variables is discussed in
[14], where it is shown that the function from $p$ to $\eta$ is a smooth ($C^\infty$) one-to-one
function having a smooth inverse, i.e., it is a diffeomorphism; see also [3]. For
general discrete variables see [19]. The parameters are not variation-independent,
i.e., they do not belong to a hyper-rectangle. However, they satisfy the upward
compatibility property, i.e., they have the same meaning across different marginal
distributions; see [14] and [19, Proposition 4]. Often the multivariate logistic link
is written as

$\eta = C \log(M p)$, (6)

where $C$ and $M$ are suitable Kronecker products of contrast and marginalization
matrices, respectively. For the explicit construction of these matrices see [2].
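For the bivariate table of Example 2, the matrices in (6) can be built explicitly. The Python/NumPy sketch below is one possible construction under stated assumptions: the cells are stacked with the index of $X_2$ running fastest, $p = (p_{11}, p_{12}, p_{13}, p_{21}, p_{22}, p_{23})$, each contrast block uses baseline coding, and the helper names are hypothetical; see [2] for the general construction.

```python
import numpy as np


def baseline_contrast(r):
    """(r - 1) x r matrix taking log-probabilities to baseline (first-level) logits."""
    return np.hstack([-np.ones((r - 1, 1)), np.eye(r - 1)])


def block_diag(*blocks):
    """Block-diagonal matrix with the given 2-D blocks."""
    out = np.zeros((sum(b.shape[0] for b in blocks), sum(b.shape[1] for b in blocks)))
    i = j = 0
    for b in blocks:
        out[i:i + b.shape[0], j:j + b.shape[1]] = b
        i, j = i + b.shape[0], j + b.shape[1]
    return out


r1, r2 = 2, 3
# Marginalization: rows give p_{i+}, then p_{+j}, then the full table p_{ij}
M = np.vstack([
    np.kron(np.eye(r1), np.ones((1, r2))),
    np.kron(np.ones((1, r1)), np.eye(r2)),
    np.eye(r1 * r2),
])
C1, C2 = baseline_contrast(r1), baseline_contrast(r2)
C = block_diag(C1, C2, np.kron(C1, C2))  # contrasts for margins {1}, {2}, {1,2}

p = np.array([0.10, 0.15, 0.05, 0.20, 0.30, 0.20])  # illustrative probabilities
eta = C @ np.log(M @ p)  # stacked (eta^(1), eta^(2), eta^(12)), length 5
print(eta)
```

The Kronecker product `np.kron(C1, C2)` reproduces the bivariate contrast of (5) for the full table, which is why the same baseline building block serves every margin.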

3.3. Saturated model 

We specify the dependence of the responses in each component $T$ on the variables
in the past by defining a saturated multivariate logistic model for the conditional
probability distribution $p_{T \mid \mathrm{pre}(T)}$. The full saturated model for the joint probability
$p$ then follows from the factorization (4).

For each covariate class $i_{\mathrm{pre}(T)} \in \mathcal{I}_{\mathrm{pre}(T)}$ let $p(i_{\mathrm{pre}(T)})$ be the vector with
strictly positive components $p(i_T \mid i_{\mathrm{pre}(T)}) > 0$ for $i_T \in \mathcal{I}_T$. Then consider the
associated conditional multivariate logistic parameters $\eta(i_{\mathrm{pre}(T)})$ defined using the
link function (6). Notice that this vector is composed of contrasts $\eta^{(A)}(i_{\mathrm{pre}(T)})$ for
all nonempty subsets $A$ of $T$. Then we express the dependence of each of them on
the variables in the preceding components by a complete factorial model

$\eta^{(A)}(i_{\mathrm{pre}(T)}) = \sum_{b \subseteq \mathrm{pre}(T)} \beta_b^{(A)}(i_b)$. (7)

Here the vectors $\beta_b^{(A)}(i_b)$ have the dimensions of the sets $\mathcal{I}_A^*$, are defined according
to the baseline coding, and thus vanish when at least one component of $i_b$ takes
on the first level. Again, different codings may be used if appropriate. It is often
useful to express (7) in matrix form

$\eta^{(A)} = Z^{(A)} \beta^{(A)}$, (8)

where $\eta^{(A)}$ is the column vector obtained by stacking all vectors $\eta^{(A)}(i_{\mathrm{pre}(T)})$ for
$i_{\mathrm{pre}(T)} \in \mathcal{I}_{\mathrm{pre}(T)}$, $Z^{(A)}$ is a full-rank design matrix and $\beta^{(A)}$ is a parameter vector.
Example 3. Suppose that in Example 2 the responses $X_1$ and $X_2$ depend on
two binary explanatory variables $X_3$ and $X_4$, with levels indexed by $k$ and $l$,
respectively. Then, the saturated model is

$\eta^{(A)}(k, l) = \beta_\emptyset^{(A)} + \beta_3^{(A)}(k) + \beta_4^{(A)}(l) + \beta_{34}^{(A)}(k, l)$, $\quad k, l = 1, 2$,

for $A = \{1\}, \{2\}, \{1,2\}$. The explicit form of the matrix $Z^{(A)}$ in equation (8) is, using the
Kronecker product operator $\otimes$,

$Z^{(A)} = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} \otimes \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} \otimes I$,

i.e., a complete factorial design matrix, where $I$ is an identity matrix
of order equal to the common dimension of each $\eta^{(A)}(k, l)$. Following [21, p. 222],
we shall denote the model, for the sake of brevity, by a multivariate model formula

$X_1 : X_3 * X_4, \quad X_2 : X_3 * X_4, \quad X_{12} : X_3 * X_4$,

where $X_3 * X_4 = X_3 + X_4 + X_3.X_4$ is the factorial expansion in Wilkinson and
Rogers' symbolic notation; [26].
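The Kronecker construction of $Z^{(A)}$ can be checked numerically. In the Python/NumPy sketch below (names and coefficient values are hypothetical), the rows of $Z^{(A)}$ are ordered as $(k,l) = (1,1), (1,2), (2,1), (2,2)$ and the column blocks as $\beta_\emptyset^{(A)}$, $\beta_4^{(A)}(2)$, $\beta_3^{(A)}(2)$, $\beta_{34}^{(A)}(2,2)$, each block having the dimension of $\eta^{(A)}(k,l)$; this ordering is the one implied by placing the factor for $X_3$ first in the Kronecker product.

```python
import numpy as np

B = np.array([[1, 0],
              [1, 1]])  # baseline-coded design of one binary factor (levels 1, 2)


def design(dim):
    """Z^(A) = B kron B kron I for two binary covariates; dim = dimension of eta^(A)(k, l)."""
    return np.kron(np.kron(B, B), np.eye(dim))


# Hypothetical coefficients for a margin with dim eta^(A)(k, l) = 2, e.g. A = {2}
beta_0 = np.array([0.5, -1.0])
beta_4 = np.array([0.2, 0.1])     # effect of l = 2
beta_3 = np.array([-0.3, 0.4])    # effect of k = 2
beta_34 = np.array([0.05, -0.2])  # interaction at (k, l) = (2, 2)
Z = design(2)
eta = Z @ np.concatenate([beta_0, beta_4, beta_3, beta_34])
print(eta.reshape(4, 2))  # rows: (k, l) = (1,1), (1,2), (2,1), (2,2)
```

Each row block of $Z^{(A)}$ simply switches on the terms of the factorial sum that are nonzero under baseline coding for that covariate class.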

When we need to express the overall one-to-one smooth transformation of the conditional
probability vectors $p(i_{\mathrm{pre}(T)})$, denoted collectively by $p_T$, into the logistic
and regression parameters, we introduce the vectors $\eta_T$ and $\beta_T$ obtained by concatenating
the parameters $\eta^{(A)}$ and $\beta^{(A)}$, respectively, for all nonempty subsets $A$
of $T$, writing

$\eta_T = Z_T \beta_T$, (9)

where $Z_T = \mathrm{diag}(Z^{(A)})$ is a full-rank block-diagonal matrix of the saturated
model, and

$C_T \log(M_T p_T) = \eta_T$, (10)

where $C_T$ and $M_T$ are suitable overall contrast and marginalization matrices.

4. Discrete multivariate regression chain graph models 
4.1. Linear constraints

A multivariate regression chain graph model is specified by zero constraints on
the parameters $\beta_T$ of the saturated model (9). We first give an example and then
state the general result.

Example 4. Continuing the previous example for the chain graph $G$ of Figure 1(a),
we shall require that $X_1$ depends only on $X_3$ and $X_2$ depends only on
$X_4$. Therefore, we specify the model

$\eta^{(1)}(k, l) = \beta_\emptyset^{(1)} + \beta_3^{(1)}(k)$,
$\eta^{(2)}(k, l) = \beta_\emptyset^{(2)} + \beta_4^{(2)}(l)$,
$\eta^{(12)}(k, l) = \beta_\emptyset^{(12)} + \beta_3^{(12)}(k) + \beta_4^{(12)}(l) + \beta_{34}^{(12)}(k, l)$,

with a corresponding multivariate model formula

$X_1 : X_3, \quad X_2 : X_4, \quad X_{12} : X_3 * X_4$.

The reduced model satisfies the two independencies $1 \perp\!\!\!\perp 4 \mid 3$ and $2 \perp\!\!\!\perp 3 \mid 4$ because
the first two equations are equivalent to $p_{1 \mid 34} = p_{1 \mid 3}$ and $p_{2 \mid 34} = p_{2 \mid 4}$, respectively.
The log odds-ratio between $X_1$ and $X_2$, on the other hand, depends in general on
all the combinations $(k, l)$ of levels of the two explanatory variables.

The following theorem, proved in the Appendix, states a general rule to parametrize
any discrete chain graph model of multivariate regression type.

Theorem 2. Let $G$ be a chain graph and let $\mathrm{pre}(T)$ be a consistent ordering of
the chain components $T \in \mathcal{T}$. A joint distribution of the discrete random vector
$X$ belongs to $\mathcal{P}_{\mathrm{mr}}(G)$ if and only if, in the multivariate logistic model (7), the
parameters $\beta_b^{(A)}(i_b) = 0$, $i_b \in \mathcal{I}_b^*$, whenever

$A$ is connected, $b \subseteq \mathrm{pre}(T)$ and $b \not\subseteq \mathrm{pa}_G(A)$, (11a)
$A$ is disconnected and $b \subseteq \mathrm{pre}(T)$, (11b)

for all $A \subseteq T$ and for all $T \in \mathcal{T}$.

Notice that equations (11a) and (11b) correspond to conditions (1a) and (1b),
respectively, of Definition 1. Thus the multivariate regression chain graph model
turns out to be $\eta^{(A)}(i_{\mathrm{pre}(T)}) = \sum_{b \subseteq \mathrm{pa}_G(A)} \beta_b^{(A)}(i_b)$ if $A$ is connected and $\eta^{(A)}(i_{\mathrm{pre}(T)}) = 0$ if $A$ is
disconnected. In matrix form we have a linear predictor

$\eta_T = Z_r \beta_r$, (12)

where $Z_r$ is the matrix of the reduced model obtained by removing selected
columns of $Z_T$, and $\beta_r$ are the associated parameters.
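The rule of Theorem 2 depends only on the connectivity of each $A \subseteq T$ and on $\mathrm{pa}_G(A)$, so the multivariate model formula can be generated automatically. A Python sketch with hypothetical function names follows. For graph (b) of Figure 1 it assumes the dashed edges 1-2, 2-3 and the parent sets $\mathrm{pa}_G(1) = \{4\}$, $\mathrm{pa}_G(2) = \emptyset$, $\mathrm{pa}_G(3) = \{5\}$, which are inferred from the independencies in Example 1 rather than read from the figure.

```python
from itertools import combinations


def is_connected(A, edges):
    """True if the subgraph G_A induced by the undirected edges is connected."""
    A = set(A)
    reached, stack = {min(A)}, [min(A)]
    while stack:
        v = stack.pop()
        for u, w in edges:
            for a, b in ((u, w), (w, u)):
                if a == v and b in A and b not in reached:
                    reached.add(b)
                    stack.append(b)
    return reached == A


def model_formula(T, edges, parents):
    """Multivariate model formula implied by (11a)-(11b) for component T."""
    terms = []
    for k in range(1, len(T) + 1):
        for A in combinations(sorted(T), k):
            lhs = "X" + "".join(str(v) for v in A)
            if not is_connected(A, edges):
                terms.append(lhs + " : 0")  # (11b): every beta_b^(A) vanishes
                continue
            pa = sorted(set().union(*(parents[v] for v in A)))  # pa_G(A)
            # (11a): only b inside pa_G(A) survive, i.e. the factorial expansion
            rhs = " * ".join(f"X{v}" for v in pa) if pa else "1"
            terms.append(f"{lhs} : {rhs}")
    return ", ".join(terms)


edges_b = [(1, 2), (2, 3)]
parents_b = {1: {4}, 2: set(), 3: {5}}
print(model_formula({1, 2, 3}, edges_b, parents_b))
```

Under these assumptions the printed formula coincides with the one given for this graph in Example 5, and the same function with edge 1-2 and parents $\{3\}$, $\{4\}$ reproduces the formula of Example 4.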

The proof of Theorem 2 is based on a basic property of the regression parameters
$\beta_b^{(A)}(i_b)$ of model (7), i.e., that they are identical to log-linear parameters defined
in selected marginal tables. Specifically, each $\beta_b^{(A)}(i_b)$ coincides with the vector of
log-linear parameters $\lambda_{A \cup b}^{A \cup \mathrm{pre}(T)}$ of order $A \cup b$ in the marginal table $A \cup \mathrm{pre}(T)$. See
Lemma 2 in the Appendix.

Theorem 2 also shows that the chain graph model $\mathcal{P}_{\mathrm{mr}}(G)$ is defined by a set of
linear restrictions on a multivariate logistic parameterization and thus is a curved
exponential family.

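Example 5. From Theorem 2, the chain graph model of Figure 1(b) is defined
by the equations, where $k$ and $l$ index the levels of $X_4$ and $X_5$,

$\eta^{(1)}(k, l) = \beta_\emptyset^{(1)} + \beta_4^{(1)}(k)$, $\quad \eta^{(2)}(k, l) = \beta_\emptyset^{(2)}$, $\quad \eta^{(3)}(k, l) = \beta_\emptyset^{(3)} + \beta_5^{(3)}(l)$,
$\eta^{(12)}(k, l) = \beta_\emptyset^{(12)} + \beta_4^{(12)}(k)$, $\quad \eta^{(13)}(k, l) = 0$, $\quad \eta^{(23)}(k, l) = \beta_\emptyset^{(23)} + \beta_5^{(23)}(l)$,
$\eta^{(123)}(k, l) = \beta_\emptyset^{(123)} + \beta_4^{(123)}(k) + \beta_5^{(123)}(l) + \beta_{45}^{(123)}(k, l)$,

and by the multivariate logistic model formula

$X_1 : X_4, \quad X_2 : 1, \quad X_3 : X_5, \quad X_{12} : X_4, \quad X_{13} : 0, \quad X_{23} : X_5, \quad X_{123} : X_4 * X_5$.

Notice that the marginal logit of $X_2$ does not depend on the variables $X_4, X_5$. This
is denoted by $X_2 : 1$. On the other hand, the missing edge (1, 3), with associated
independence $1 \perp\!\!\!\perp 3 \mid 4,5$, implies that the bivariate logit between $X_1$ and $X_3$ is
zero, denoted by the model formula $X_{13} : 0$. The above equations reflect exactly the
independence structure encoded by the multivariate regression Markov property
but leave the three-variable logistic parameter $\eta^{(123)}$ completely unrestricted.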

Table 1 lists the parameters (and their log-linear interpretations) of the saturated
model. The nonvanishing parameters of the chain graph model are marked in the
table, which also indicates the interactions of order higher
than two. Therefore, the chain graph model contains 6 parameters of order higher than two
that have a more complex interpretation and that are not strictly needed to
define the independence structure. This suggests considering as a starting model
a multivariate logistic regression model with no parameters of log-linear order
higher than two, and then using a backward selection strategy to test for the
independencies. Some adjustment of the procedure is needed to include selected
higher-order interactions when required. Notice also that the parameters in
Table 1 form a marginal log-linear parameterization in the sense of Bergsma and
Rudas [3], a result that can be proved for any discrete multivariate regression
chain graph model.

A parallel multivariate logistic parameterization for the model $\mathcal{P}_{\mathrm{IV}}(G)$ can be
obtained from Definition 2 and the associated characterization in terms of
densities of Lemma 1, in the Appendix. In this case, using the factorization (3), the
multivariate logistic models can be defined in the lower-dimensional conditional
distributions $p_{T \mid \mathrm{pa}_D(T)}$. Therefore we state the following corollary.

Table 1

Marginal log-linear parameters of the saturated model for a discrete multivariate logistic model
with three responses and two explanatory variables. Each row lists log-linear parameters defined
within the marginal table indicated in the last column. Terms marked with an asterisk are the
nonzero terms of the chain graph model of Example 5. Terms with more than two indices are the
interactions of order higher than two.

                    Parameters
Logit    const.    4        5        45         Margin
1        1*        14*      15       145        145
2        2*        24       25       245        245
3        3*        34       35*      345        345
12       12*       124*     125      1245       1245
13       13        134      135      1345       1345
23       23*       234      235*     2345       2345
123      123*      1234*    1235*    12345*     12345
Corollary 1. The joint probability distribution of the random vector $X$ belongs
to $\mathcal{P}_{\mathrm{IV}}(G)$ if and only if it factorizes according to equation (3), and for each
conditional distribution $p(i_T \mid i_{\mathrm{pa}_D(T)})$, for $T \in \mathcal{T}$, the multivariate logistic parameters
are

$\eta^{(A)}(i_{\mathrm{pa}_D(T)}) = \begin{cases} \sum_{b \subseteq \mathrm{pa}_G(A)} \beta_b^{(A)}(i_b), & \text{for all connected } A \subseteq T, \\ 0, & \text{for all disconnected } A \subseteq T. \end{cases}$

For the class of models defined in Remark 3, which corresponds exactly to the
factorization (3), all the independencies are obtained by setting $\mathrm{pa}_G(A) = \mathrm{pa}_D(T)$ for
all $A \subseteq T$ in equation (11a).



4.2. Likelihood inference

The estimation of discrete multivariate regression chain graph models can be carried out
by fitting separate multivariate logistic regression models to each factor $p_{T \mid \mathrm{pre}(T)}$
of the decomposition (4). Specifically, given a block $T$ of responses and the group
of covariates $\mathrm{pre}(T)$, we consider the table of frequencies $Y_k$ for each covariate
class $k$, where $k = 1, \dots, K$ is an index numbering the cells of the marginal
table $\mathcal{I}_{\mathrm{pre}(T)}$. Then we assume that each $Y_k \sim \mathcal{M}(n_k, p_k)$ is multinomial with
$p_k = p(i_{\mathrm{pre}(T)})$. Given $K$ independent observations $(Y_1, n_1), \dots, (Y_K, n_K)$, the
vector $Y = \mathrm{vec}(Y_1, \dots, Y_K)$ has a product-multinomial distribution and the
log-likelihood is

$l(\omega) = y^\top \omega - \mathbf{1}^\top \exp(\omega)$, (14)

where $\omega = \log E(Y) = \log \mu$, and $C_T \log(M_T \mu) = Z_r \beta_r$, from (12). The maximization
of this likelihood under the above linear constraints has been discussed
by several authors; see [14], [16], [3], [2], among others.
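A direct transcription of (14) into Python/NumPy, offered as a sketch only: the function names and the counts are hypothetical, and no constrained maximization is attempted. Without the linear constraints the score $y - \exp(\omega)$ vanishes at $\omega = \log y$, which the usage lines check.

```python
import numpy as np


def loglik(y, omega):
    """Log-likelihood kernel (14): l(omega) = y' omega - 1' exp(omega)."""
    return float(np.dot(y, omega) - np.exp(omega).sum())


def score(y, omega):
    """Gradient of (14) with respect to omega: y - exp(omega)."""
    return np.asarray(y, dtype=float) - np.exp(omega)


y = np.array([30.0, 10.0, 20.0, 40.0])  # hypothetical counts, stacked over cells
omega_hat = np.log(y)                    # unconstrained maximizer, mu = y
print(score(y, omega_hat))               # zero up to rounding
print(loglik(y, omega_hat) > loglik(y, omega_hat + 0.1))  # True
```

Under the constraints $C_T \log(M_T \mu) = Z_r \beta_r$ the maximizer is no longer $\log y$, and one of the iterative algorithms in the cited references is needed.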

Example 6. We give a simple illustration based on an application to data from
the U.S. General Social Survey [8], for the years 1972-2006. The data are collected on
13067 individuals on 5 variables. There are three binary responses concerning
individual opinions (1 = favor, 2 = oppose): on legal abortion if pregnant as a result
of rape, A; on the death penalty for those convicted of murder, C; and on the introduction
of police permits for buying guns, G. The potentially explanatory variables
considered are J, job satisfaction (with three levels: 1 = very satisfied, 2 = moderately
satisfied, 3 = a little or very dissatisfied), and S, gender (1 = male, 2 = female). We
can interpret responses G and C as indicators of the attitude towards individual
safety, and C and A as indicators of the concern for the value of human life,
even in extreme situations.

The two explanatory variables turned out to be independent (with a likelihood-ratio test statistic of w = 0.79, 2 d.f.). Hence, we concentrate on the model for the conditional distribution $p_{GCA\mid JS}$. Here the saturated model (9) has 42 parameters, and the structure of the parameters is that of Table 1, with the only modification of the dimensions of the interaction parameters involving the factor J, which has three levels. We describe a hierarchical backward selection strategy. For this, we first examine the sequence of models obtained by successively removing the higher-order interactions, see Table 1. Then, we drop some of the remaining terms to fit independencies.

Table 2

Multivariate regression chain graph model selection for GSS data. Model (1) is the pure independence model of Fig. 3 for $p_{GCA\mid JS}$. Models (2)-(7) are fitted during the suggested model selection procedure. On the right, the fitted parameters for the best selected model.

Model for p_{GCA|JS}                       Deviance   d.f.
(1) G ⊥⊥ A | J,S and G ⊥⊥ J | S               12.84     10
(2) No 5-factor interaction                     0.49      2
(3) + no 4-factor interactions                  5.59     11
(4) + no 3-factor interactions                 30.16     27
(6) + Delete edge GA                           33.38     28
(7) + Delete edge GJ                           34.25     30

Right: fitted parameters for the best selected model

Logit    const.    J_mod    J_full    S_f
G         0.766                        0.766
C         1.051     0.150    0.257    -0.458
A         1.826    -0.033   -0.245    -0.172
GC       -0.303
CA        0.557
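The degrees of freedom reported in Table 2 can be checked by counting parameters. The Python sketch below assumes the baseline coding used in the text, so that a term involving a set of variables has dimension equal to the product of (number of levels - 1), and that every term of the model for $p_{GCA\mid JS}$ involves at least one response; the names `LEVELS`, `dim`, `dropped_df` and `involves` are illustrative helpers, not part of any library.

```python
from itertools import combinations
from math import prod

# Levels of the GSS variables in Example 6: responses G, C, A; covariates J, S.
LEVELS = {"G": 2, "C": 2, "A": 2, "J": 3, "S": 2}
RESPONSES = {"G", "C", "A"}


def dim(term):
    """Number of baseline-coded parameters of an interaction term."""
    return prod(LEVELS[v] - 1 for v in term)


# Terms of the saturated model for p_{GCA|JS}: every subset of variables that
# contains at least one response (pure covariate terms belong to p_{JS}).
TERMS = [
    frozenset(t)
    for k in range(1, len(LEVELS) + 1)
    for t in combinations(LEVELS, k)
    if RESPONSES & set(t)
]


def dropped_df(terms):
    """Degrees of freedom gained by setting the given terms to zero."""
    return sum(dim(t) for t in terms)


def involves(term, must, forbid):
    """True if the term contains all variables in `must` and none in `forbid`."""
    return must <= term and not (forbid & term)


saturated = dropped_df(TERMS)
df = {
    "(2)": dropped_df(t for t in TERMS if len(t) >= 5),
    "(3)": dropped_df(t for t in TERMS if len(t) >= 4),
    "(4)": dropped_df(t for t in TERMS if len(t) >= 3),
}
df["(6)"] = df["(4)"] + dim({"G", "A"})  # delete edge GA
df["(7)"] = df["(6)"] + dim({"G", "J"})  # delete edge GJ
# Model (1): the logit of G is free of J (G indep. of J given S), and the GA
# association is zero in every covariate class (G indep. of A given J, S).
df["(1)"] = dropped_df(
    t for t in TERMS
    if involves(t, {"G", "J"}, {"C", "A"}) or involves(t, {"G", "A"}, {"C"})
)
```

The counts obtained (42 for the saturated model; 2, 11, 27, 28, 30 and 10 for models (2), (3), (4), (6), (7) and (1)) agree with the d.f. column of Table 2.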

The results are shown in Table 2. The model with no interactions of order three or higher, model (4), has a deviance of 30.16 with 27 degrees of freedom and is adequate. From the edge exclusion deviances, we verify that we can remove the edges GA (w = 33.38 - 30.16 = 3.22, 1 d.f.) and GJ (w = 34.25 - 33.38 = 0.87, 2 d.f.). The final multivariate regression chain graph model, shown in Fig. 3(a), has a combined deviance of 34.25 + 0.79 = 35.04 on 32 degrees of freedom.
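The edge-exclusion tests compare nested models through differences in deviance, referred to a chi-square distribution. A minimal stdlib-only Python sketch, assuming the usual asymptotic chi-square reference distribution; `chi2_sf` is a small helper written here (closed forms of the regularized incomplete gamma function for integer degrees of freedom), not a library function, and the deviances are those of Table 2.

```python
import math


def chi2_sf(x, df):
    """P(X > x) for X ~ chi-square with integer df, via closed forms of the
    regularized upper incomplete gamma function Q(df/2, x/2)."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Q(m, y) = exp(-y) * sum_{j < m} y^j / j!
        return math.exp(-y) * sum(y ** j / math.factorial(j) for j in range(df // 2))
    # Odd df: start from Q(1/2, y) = erfc(sqrt(y)) and use
    # Q(a + 1, y) = Q(a, y) + y^a exp(-y) / Gamma(a + 1).
    q = math.erfc(math.sqrt(y))
    for j in range(df // 2):
        a = j + 0.5
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1))
    return q


# (deviance, d.f.) taken from Table 2.
FITS = {"(4)": (30.16, 27), "(6)": (33.38, 28), "(7)": (34.25, 30)}


def lr_test(larger, smaller):
    """Deviance difference w, its d.f., and the chi-square p-value."""
    (d0, f0), (d1, f1) = FITS[larger], FITS[smaller]
    w, df = round(d1 - d0, 2), f1 - f0
    return w, df, chi2_sf(w, df)


w_GA = lr_test("(4)", "(6)")  # delete edge GA
w_GJ = lr_test("(6)", "(7)")  # delete edge GJ
# Combined fit: model (7) for p_{GCA|JS} plus independence of J (3 levels)
# and S (2 levels), which has (3 - 1)(2 - 1) = 2 d.f.
combined = (round(34.25 + 0.79, 2), 30 + 2)
p_combined = chi2_sf(*combined)
```

With these helpers, the GA exclusion gives a p-value of roughly 0.07 and the GJ exclusion roughly 0.65, so neither edge is needed at the 5% level.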

Notice that the model includes both independence and non-independence constraints, the latter following from our preference for a model with all interpretable parameters. The chain graph model corresponding exactly to the implied independencies has far more parameters, with a deviance of 12.84 + 0.79 = 13.63 on 12 degrees of freedom. While this model is adequate, the chosen model has a simpler interpretation. The fitted parameters are shown in Table 2, right. The first three rows give the parameters of three univariate logit regressions for being in favor of the issue. J_mod and J_full measure the effect of moderate and full job satisfaction, respectively, with respect to a baseline level of no satisfaction, and S_f is the effect of being female. Thus the effect of increased job satisfaction, whatever the gender, is to increase the probability of being in favor of capital punishment and against abortion. Women are more favorable than men to gun regulation and more opposed to the death penalty and abortion, other things being equal. The negative residual association between G and C and the positive one between C and A, having accounted for gender and job satisfaction, are as expected. As a comparison, in this example, a best fitting classical chain graph model with LWF interpretation has one additional edge, as shown in Figure 3. The multivariate regression chain graph has a simpler interpretation in terms of three additive logistic regressions and two residual associations interpretable as deriving from two latent variables.

Fig 3. (a) The multivariate regression chain graph model fitted to GSS data (Deviance = 13.63, d.f. = 12). The final fitted model including further non-independence constraints has a Deviance = 35.04 on 32 d.f. (b) The best fitting LWF chain graph model (Deviance = 12.81, d.f. = 18).
Appendix: Proofs

We shall assume for the joint distribution the existence of a density with respect to a 
product measure. Proofs using only basic properties of conditional independence can 
also be given, but are omitted for brevity. 

Lemma 1. The block-recursive Markov property of type IV is equivalent to the following three statements: for all $T \in \mathcal T$,

$$ p_{T\mid \mathrm{pre}(T)} = p_{T\mid \mathrm{pa}_D(T)}, \tag{15a} $$

$$ p_{A\mid \mathrm{pa}_D(T)} = p_{A\mid \mathrm{pa}_G(A)} \quad \text{for all connected } A \subseteq T, \tag{15b} $$

$$ p_{A\mid \mathrm{pa}_D(T)} = \prod_j p_{A_j\mid \mathrm{pa}_D(T)} \quad \text{for all disconnected } A \subseteq T, \tag{15c} $$

where $A_j$, $j = 1, \dots, r$, are the connected components of $A$, if disconnected.

Proof. Condition (iv0) states that the joint probability distribution obeys the local directed Markov property relative to the directed graph $D$ of the chain components. Then, using the equivalence of the local and well-ordered local Markov properties in directed graphs, applied to the graph of the components as discussed in [10, Appendix A], (iv0) turns out to be equivalent to (15a) for any ordering of the components consistent with the chain graph. Moreover, condition (iv2) has been proved by [12] to be equivalent to the joint independence (15c). Statement (iv1) implies (15b); in fact, (iv1) need only be required for connected subsets $A$, because for disconnected subsets it follows from (15c) together with (15b) restricted to connected sets. If $A$ is disconnected, (15c) implies

$$ p_{A\mid \mathrm{pa}_D(T)} = \prod_j p_{A_j\mid \mathrm{pa}_D(T)} = \prod_j p_{A_j\mid \mathrm{pa}_G(A_j)} = \prod_j p_{A_j\mid \mathrm{pa}_G(A)} \tag{16} $$

by applying (15b) to the connected sets $A_j$ and noting that $\mathrm{pa}_G(A_j) \subseteq \mathrm{pa}_G(A)$. Therefore, $p_{A\mid \mathrm{pa}_D(T)} = p_{A\mid \mathrm{pa}_G(A)}$ and equation (iv1) follows. □

We are now ready to prove that the multivariate regression Markov property is equivalent to the above block-recursive Markov property.

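Proof of Theorem 1. We establish the equivalence of (1a) and (1b) with (15a), (15b), (15c) of Lemma 1.

(Definition 1 implies Definition 2.) Equation (1a) implies $p_{A\mid \mathrm{pre}(T)} = p_{A\mid \mathrm{pa}_D(T)}$ for all connected $A$, because $\mathrm{pa}_G(A) \subseteq \mathrm{pa}_D(T)$. Thus (1a) implies (15b), and also (15a) by taking $A = T$, because any $G_T$ is connected, by definition. Thus, if $A$ is disconnected, (1b) gives

$$ p_{A\mid \mathrm{pre}(T)} = \prod_j p_{A_j\mid \mathrm{pre}(T)} = \prod_j p_{A_j\mid \mathrm{pa}_D(T)}, $$

and since $p_{A\mid \mathrm{pre}(T)} = p_{A\mid \mathrm{pa}_D(T)}$ by (15a), (15c) follows.

(Definition 2 implies Definition 1.) Statement (15a) implies, for $A \subseteq T$, that $p_{A\mid \mathrm{pre}(T)} = p_{A\mid \mathrm{pa}_D(T)}$. Thus for all connected $A$, (15b) implies $p_{A\mid \mathrm{pre}(T)} = p_{A\mid \mathrm{pa}_G(A)}$, i.e., (1a). Moreover, if $A \subseteq T$ is disconnected, (15c) implies

$$ p_{A\mid \mathrm{pre}(T)} = p_{A\mid \mathrm{pa}_D(T)} = \prod_j p_{A_j\mid \mathrm{pa}_D(T)} = \prod_j p_{A_j\mid \mathrm{pre}(T)}, $$

i.e., (1b). □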

Given a subvector $X_M$ of the given random vector $X$, the log-linear expansion of its marginal probability distribution $p_M$ is

$$ \log p_M(i_M) = \sum_{s \subseteq M} \lambda^M_s(i_s), \tag{17} $$

where $\lambda^M_s(i_s)$ defines the 'interaction' parameters of order $s$ in the baseline parameterization, i.e., with the implicit constraint that the function returns zero whenever at least one of the indices in $i_s$ takes the first level.
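The baseline parameters in (17) can be computed from the cell probabilities by Möbius inversion over subsets, $\lambda^M_s(i_s) = \sum_{t \subseteq s} (-1)^{|s \setminus t|} \log p_M(i_t, 1_{M\setminus t})$. The Python sketch below assumes levels coded 0, 1, ... with 0 as the first (baseline) level and uses an arbitrary illustrative $2\times 3\times 2$ table; `subsets` and `baseline_lambda` are helper names introduced here.

```python
import math
from itertools import combinations, product


def subsets(s):
    """All subsets of the tuple s, as tuples."""
    s = tuple(s)
    for k in range(len(s) + 1):
        yield from combinations(s, k)


def baseline_lambda(logp, nvars, s, i):
    """lambda_s(i_s) = sum over t subset of s of (-1)^{|s|-|t|} log p(i_t, 0 elsewhere).

    logp maps full cells (tuples) to log-probabilities; only the coordinates
    of i listed in s are used.
    """
    total = 0.0
    for t in subsets(s):
        cell = tuple(i[v] if v in t else 0 for v in range(nvars))
        total += (-1) ** (len(s) - len(t)) * logp[cell]
    return total


# Illustrative 2 x 3 x 2 table with arbitrary positive weights.
levels = (2, 3, 2)
cells = list(product(*(range(n) for n in levels)))
weights = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]
total_w = sum(weights)
logp = {c: math.log(w / total_w) for c, w in zip(cells, weights)}

nvars = len(levels)
# Reconstruct log p(i) as the sum of lambda_s(i_s) over all s, as in (17).
recon = {
    i: sum(baseline_lambda(logp, nvars, s, i) for s in subsets(range(nvars)))
    for i in cells
}
```

The reconstruction recovers every $\log p(i)$, and each $\lambda_s(i_s)$ vanishes whenever an index in $i_s$ is at the baseline level, because the terms for $t$ and $t \cup \{v\}$ then refer to the same cell and cancel.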
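Lemma 2. If $\eta^{(A)}(i_A \mid i_{\mathrm{pre}(T)})$ is the multivariate logistic contrast of the conditional probability distribution $p_{A\mid \mathrm{pre}(T)}$ for $A$ a subset of $T$, then, with $B = \mathrm{pre}(T)$,

$$ \eta^{(A)}(i_A \mid i_B) = \sum_{b \subseteq B} \lambda^{AB}_{Ab}(i_A, i_b), \tag{18} $$

where $\lambda^{AB}_{Ab}(i_A, i_b)$ are the log-linear interaction parameters of order $A \cup b$ in the marginal probability distribution $p_{AB}$.

Proof. First note that the multivariate logistic contrasts $\eta^{(A)}(i_A \mid i_B)$ can be written

$$ \eta^{(A)}(i_A \mid i_B) = \sum_{s \subseteq A} (-1)^{|A\setminus s|} \log p_{AB}(i_s, i_B, 1_{A\setminus s}), \tag{19} $$

since the normalizing term $\log p_B(i_B)$ of the conditional distribution cancels, as $\sum_{s \subseteq A} (-1)^{|A\setminus s|} = 0$ for nonempty $A$. Then, we express the logarithm of the joint probabilities $p_{AB}$ as a sum of log-linear interactions using (17):

$$ \log p_{AB}(i_s, i_B, 1_{A\setminus s}) = \sum_{a \subseteq A}\sum_{b \subseteq B} \lambda^{AB}_{ab}(i_{a\cap s}, 1_{a\setminus s}, i_b) = \sum_{a \subseteq s}\sum_{b \subseteq B} \lambda^{AB}_{ab}(i_a, i_b). $$

Therefore, by substitution into equation (19) we get

$$ \eta^{(A)}(i_A \mid i_B) = \sum_{s \subseteq A} (-1)^{|A\setminus s|} \sum_{a \subseteq s}\sum_{b \subseteq B} \lambda^{AB}_{ab}(i_a, i_b) = \sum_{b \subseteq B}\sum_{s \subseteq A} (-1)^{|A\setminus s|} \sum_{a \subseteq s} \lambda^{AB}_{ab}(i_a, i_b) = \sum_{b \subseteq B} \lambda^{AB}_{Ab}(i_A, i_b), $$

where the last equality is obtained using Möbius inversion, see [17, p. 239, Lemma A.2]. □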
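Lemma 2 can be checked numerically. The Python sketch below uses an arbitrary illustrative joint table for two binary responses $(Y_1, Y_2)$ and a three-level covariate $X$ (all names and numbers hypothetical; level 0 is the baseline). It compares $\eta^{(A)}(i_A \mid i_B)$, computed directly from $p_{A\mid B}$, with the right-hand side of (18), both for $A = \{Y_1, Y_2\}$ and for $A = \{Y_1\}$, the latter using the marginal table of $(Y_1, X)$ as the lemma requires.

```python
import math
from itertools import combinations, product


def subsets(s):
    """All subsets of the tuple s, as tuples."""
    s = tuple(s)
    for k in range(len(s) + 1):
        yield from combinations(s, k)


def marginal(p, keep):
    """Marginal table of p over the variables (indices) in keep, in that order."""
    out = {}
    for cell, v in p.items():
        key = tuple(cell[j] for j in keep)
        out[key] = out.get(key, 0.0) + v
    return out


def lam(logp, nvars, s, i):
    """Baseline log-linear parameter lambda_s(i_s) by Moebius inversion (level 0 = baseline)."""
    return sum(
        (-1) ** (len(s) - len(t)) * logp[tuple(i[v] if v in t else 0 for v in range(nvars))]
        for t in subsets(s)
    )


def eta(p_ab, n_a, i_a, i_b):
    """Multivariate logistic contrast eta^(A)(i_A | i_B) computed from p_{A|B}.

    The first n_a coordinates of each cell of p_ab are A, the rest are B.
    """
    p_b = sum(v for cell, v in p_ab.items() if cell[n_a:] == i_b)
    return sum(
        (-1) ** (n_a - len(s))
        * math.log(p_ab[tuple(i_a[v] if v in s else 0 for v in range(n_a)) + i_b] / p_b)
        for s in subsets(range(n_a))
    )


def lemma2_rhs(p_ab, n_a, i_a, i_b):
    """Sum over b subset of B of lambda^{AB}_{A u b}(i_A, i_b) from the log-linear expansion of p_AB."""
    nvars = n_a + len(i_b)
    logp = {c: math.log(v) for c, v in p_ab.items()}
    a = tuple(range(n_a))
    return sum(lam(logp, nvars, a + b, i_a + i_b) for b in subsets(range(n_a, nvars)))


# Hypothetical joint table of (Y1, Y2, X) with levels (2, 2, 3).
cells = list(product(range(2), range(2), range(3)))
weights = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]
p = {c: w / sum(weights) for c, w in zip(cells, weights)}

# A = {Y1, Y2}, B = {X}: full table; A = {Y1}, B = {X}: the (Y1, X) margin.
checks = []
for p_ab, n_a in ((p, 2), (marginal(p, (0, 2)), 1)):
    for cell in p_ab:
        i_a, i_b = cell[:n_a], cell[n_a:]
        checks.append(abs(eta(p_ab, n_a, i_a, i_b) - lemma2_rhs(p_ab, n_a, i_a, i_b)))
```

All differences in `checks` are at the level of floating-point rounding, as (18) predicts.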

Lemma 2 is used in the proof of Theorem 2 given below. 

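Proof of Theorem 2. If (11a) holds for any chain component $T$, then for any connected set $A \subseteq T$, $\eta^{(A)}(i_A \mid i_{\mathrm{pre}(T)})$ is a function of $i_{\mathrm{pa}_G(A)}$ only. Therefore, using the diffeomorphism and the property of upward compatibility discussed in Remark 5, the conditional distribution $p_{A\mid \mathrm{pre}(T)}$ coincides with $p_{A\mid \mathrm{pa}_G(A)}$ and condition (mr1) holds.

Conversely, if condition (mr1) holds, that is, $p_{A\mid \mathrm{pre}(T)} = p_{A\mid \mathrm{pa}_G(A)}$ for all connected subsets $A$ of $T$, then the components of $\eta^{(A)}(i_A \mid i_{\mathrm{pre}(T)})$ are

$$ \eta^{(A)}(i_A \mid i_{\mathrm{pre}(T)}) = \sum_{s \subseteq A} (-1)^{|A\setminus s|} \log p_{A\mid B}(i_s, 1_{A\setminus s} \mid i_B) = \sum_{b \subseteq B} \lambda^{AB}_{Ab}(i_A, i_b), \quad \text{with } B = \mathrm{pa}_G(A), $$

by Lemma 2, and thus (11a) holds with $\beta^{(A)}_b(i_b) = \lambda^{AB}_{Ab}(i_b)$, where $\lambda^{AB}_{Ab}(i_b)$ denotes the vector of log-linear parameters $\lambda^{AB}_{Ab}(i_A, i_b)$ for all $i_A \in \mathcal I_A$.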

Condition (mr2) of Definition 1 is equivalent to imposing that, for any chain component $T$, the conditional distribution $p_{T\mid \mathrm{pre}(T)}$ satisfies the independence model of a covariance subgraph $G_T$. In [15] and [19] it is proved that, given a joint distribution $p_T$, a covariance graph model is satisfied if and only if, in the multivariate logistic parameterization $\eta_T$, $\eta^{(A)} = 0$ for all disconnected sets $A \subseteq T$. Therefore, extending this result to the conditional distribution $p_{T\mid \mathrm{pre}(T)}$ and considering the diffeomorphism (7), condition (mr2) holds if and only if $\eta^{(A)}(i_A \mid i_{\mathrm{pre}(T)}) = 0$ for every disconnected set $A \subseteq T$. Following the factorial model (7), $\beta^{(A)}_b(i_b) = 0$ with $b \subseteq \mathrm{pre}(T)$, for each disconnected subset $A$ of $T$. Notice that, by Lemma 2, $\beta^{(A)}_b(i_b) = \lambda^{AB}_{Ab}(i_b) = 0$, with $b \subseteq \mathrm{pre}(T)$. □
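A small numerical illustration of the covariance graph result quoted above, in the simplest case of a single covariate class and three binary responses G, C, A with edges GC and CA, so that {G, A} is the only disconnected set. The Python sketch below builds a hypothetical distribution with G and A marginally independent, $p(g, c, a) = p(g)\,p(a)\,p(c \mid g, a)$, and computes the bivariate multivariate logistic contrasts, which for binary pairs are the marginal log odds ratios; all numbers are illustrative.

```python
import math
from itertools import product

# Hypothetical margins of G and A, and P(C = 0 | G = g, A = a); level 0 is the baseline.
p_g = [0.6, 0.4]
p_a = [0.3, 0.7]
p_c0 = {(0, 0): 0.8, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.1}

# Joint p(g, c, a) = p(g) p(a) p(c | g, a): G and A are marginally independent.
joint = {}
for g, c, a in product(range(2), repeat=3):
    pc = p_c0[(g, a)] if c == 0 else 1.0 - p_c0[(g, a)]
    joint[(g, c, a)] = p_g[g] * p_a[a] * pc


def margin2(p, i, j):
    """Bivariate marginal table of variables i and j."""
    m = {}
    for cell, v in p.items():
        key = (cell[i], cell[j])
        m[key] = m.get(key, 0.0) + v
    return m


def eta_pair(m):
    """Bivariate multivariate logistic contrast = log odds ratio (baseline level 0)."""
    return math.log(m[(1, 1)] * m[(0, 0)] / (m[(1, 0)] * m[(0, 1)]))


eta_GA = eta_pair(margin2(joint, 0, 2))  # disconnected pair: vanishes
eta_GC = eta_pair(margin2(joint, 0, 1))  # edge GC: unconstrained
eta_CA = eta_pair(margin2(joint, 1, 2))  # edge CA: unconstrained
```

Only the contrast for the disconnected pair is forced to zero. In the conditional setting of Theorem 2 the same constraint is imposed within every covariate class, which is what $\eta^{(A)}(i_A \mid i_{\mathrm{pre}(T)}) = 0$ expresses.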



